Collaborative Ranking between Supervised and Unsupervised Approaches for Keyphrase Extraction

Authors

  • Gerardo Figueroa
  • Yi-Shin Chen
Abstract

Automatic keyphrase extraction methods have generally taken either supervised or unsupervised approaches. Supervised methods extract keyphrases by using a training document set, thus acquiring knowledge from a global collection of texts. Conversely, unsupervised methods extract keyphrases by determining their relevance in a single-document context, without prior learning. We present a hybrid keyphrase extraction method for short articles, HybridRank, which leverages the benefits of both approaches. Our system implements modified versions of the TextRank (Mihalcea and Tarau, 2004)—unsupervised—and KEA (Witten et al., 1999)—supervised—methods, and applies a merging algorithm to produce an overall list of keyphrases. We have tested HybridRank on more than 900 abstracts belonging to a wide variety of subjects, and show its superior effectiveness. We conclude that knowledge collaboration between supervised and unsupervised methods can produce higher-quality keyphrases than applying these methods individually.
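
The abstract does not say how the merging algorithm combines the two ranked lists. As a rough illustration only, the sketch below fuses an unsupervised ranking and a supervised ranking with reciprocal rank fusion, a generic rank-merging technique that is swapped in here and is not claimed to be HybridRank's actual method. The function name `merge_rankings`, the constant `k=60`, and the sample phrases are all assumptions made for this example.

```python
from collections import defaultdict


def merge_rankings(rankings, k=60):
    """Fuse several ranked keyphrase lists with reciprocal rank fusion (RRF).

    Hypothetical helper: this is a generic stand-in, not the merging
    algorithm used by HybridRank, which the abstract does not specify.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Each list contributes 1 / (k + rank) for every phrase it contains,
        # so phrases ranked highly by both methods accumulate the most score.
        for rank, phrase in enumerate(ranking, start=1):
            scores[phrase] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically.
    return sorted(scores, key=lambda p: (-scores[p], p))


# Made-up example rankings standing in for the two methods' outputs.
unsupervised = ["keyphrase extraction", "graph ranking", "short articles"]
supervised = ["keyphrase extraction", "training documents", "graph ranking"]

merged = merge_rankings([unsupervised, supervised])
print(merged)
```

A phrase that appears in both lists, such as "graph ranking" here, ends up above phrases that only one method proposes. That is one simple way to model the kind of knowledge collaboration the abstract describes.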

Similar resources

Reducing Over-generation Errors for Automatic Keyphrase Extraction using Integer Linear Programming

We introduce a global inference model for keyphrase extraction that reduces overgeneration errors by weighting sets of keyphrase candidates according to their component words. Our model can be applied on top of any supervised or unsupervised word weighting function. Experimental results show a substantial improvement over commonly used word-based ranking approaches.

WordTopic-MultiRank: A New Method for Automatic Keyphrase Extraction

Automatic keyphrase extraction aims to pick out a set of terms as a representation of a document without manual assignment efforts. Supervised and unsupervised graph-based ranking methods have been studied for this task. However, previous methods usually computed importance scores of words under the assumption of a single relation between words. In this work, we propose WordTopic-MultiRank as a n...

Re-examining Automatic Keyphrase Extraction Approaches in Scientific Articles

We tackle two major issues in automatic keyphrase extraction using scientific articles: candidate selection and feature engineering. To develop an efficient candidate selection method, we analyze the nature and variation of keyphrases and then select candidates using regular expressions. Secondly, we re-examine the existing features broadly used for the supervised approach, exploring different ...

Approximate Matching for Evaluating Keyphrase Extraction

We propose a new evaluation strategy for keyphrase extraction based on approximate keyphrase matching. It corresponds well with human judgments and is better suited to assess the performance of keyphrase extraction approaches. Additionally, we propose a generalized framework for comprehensive analysis of keyphrase extraction that subsumes most existing approaches, which allows for fair testing ...

State of the Art of Automatic Keyphrase Extraction Methods (État de l'art des méthodes d'extraction automatique de termes-clés) [in French]

This article presents the state of the art of automatic keyphrase extraction methods. The aim of the automatic keyphrase extraction task is to extract the most representative terms of a document. Automatic keyphrase extraction methods can be divided into two categories: supervised methods and unsupervised methods. For supervised me...

Publication date: 2014